fix: convergence issue by adding use_inductor=False in vllm compilation_config #1014
Conversation
@ZhiyuLi-Nvidia good find! Can you share performance on larger qwen models as well? Also, please attach the plots to the PR description since not everyone can access internal wandb reports.
terrykong left a comment
nice find @ZhiyuLi-Nvidia!
is it possible to construct a model diagnostic test for this?
https://github.com/NVIDIA-NeMo/RL/tree/main/tools/model_diagnostics
might be helpful for others who are debugging their model run
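A hypothetical sketch of what such a diagnostic could look like (this script does not exist in tools/model_diagnostics; the model id is a placeholder, and passing a `compilation_config` dict to `LLM(...)` is an assumption about the vLLM Python API): score the same prompts with and without Inductor and compare the prompt-token logprobs.

```python
# Hypothetical diagnostic sketch: compare prompt-token logprobs from vLLM with
# and without Inductor-compiled kernels. Run each configuration in its own
# process (two invocations of this script) to avoid holding two engines on one
# GPU at the same time.
import sys
from vllm import LLM, SamplingParams

MODEL = "Qwen/Qwen2.5-1.5B-Instruct"   # placeholder model for illustration
PROMPTS = ["The capital of France is Paris.", "1 + 1 = 2."]

def prompt_logprobs(use_inductor: bool) -> list[list[float]]:
    # The compilation_config dict mirrors the {"use_inductor": False} flag this
    # PR sets; whether LLM(...) accepts it as a plain dict depends on the vLLM version.
    llm = LLM(model=MODEL, compilation_config={"use_inductor": use_inductor})
    params = SamplingParams(max_tokens=1, temperature=0.0, prompt_logprobs=0)
    outs = llm.generate(PROMPTS, params)
    # Collect the logprob vLLM assigns to each actual prompt token
    # (the first entry of prompt_logprobs is None and is skipped).
    return [
        [next(iter(lp.values())).logprob for lp in out.prompt_logprobs if lp]
        for out in outs
    ]

if __name__ == "__main__":
    use_inductor = (sys.argv[1].lower() == "true") if len(sys.argv) > 1 else True
    for prompt, row in zip(PROMPTS, prompt_logprobs(use_inductor)):
        print(prompt, [round(x, 4) for x in row])
```

Running it once with `true` and once with `false` and diffing the printed logprobs would show whether the Inductor-generated kernels are the source of the discrepancy.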
Thank you @parthchadha
Which model do you recommend?
Added the key screenshots.
…lation_config Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
force-pushed from 6883f11 to b3aae4f
Let's run qwen 32b from #957 (we can try with 32k osl)
Good suggestion. Added.
force-pushed from 55191aa to f5bf231
@terrykong added output example 2ba5e3e
@parthchadha I kept getting OOM in the middle of training. Shall we come back to it once this is merged or in a more stable state?
force-pushed from 1bcb7ae to dddcbf0
…on_config (NVIDIA-NeMo#1014) Signed-off-by: Zhiyu Li <zhiyul@NVIDIA.com>
What does this PR do?
Closes #998.
Looks like the issue can be resolved with the compilation flag `{"use_inductor": False}`. With this flag, vLLM uses its custom CUDA kernels instead of the Triton kernels generated by torch.compile, which appear to cause the numerical issue here.
There are no logprob error spikes over 140 steps and rewards increase stably. Training speed looks similar.
https://wandb.ai/nvidia/grpo-dev-zhiyul/workspace?nw=nwuserzhiyul
Issues
Closes #998.
Usage
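A minimal sketch of the flag at the vLLM level (the model id below is only an example, and passing a `compilation_config` dict to `LLM(...)` is an assumption about the vLLM Python API for the version in use; this PR sets the flag where NeMo RL constructs its vLLM engine, so users normally do not pass it themselves):

```python
# Minimal sketch: pass use_inductor=False via compilation_config so vLLM keeps
# its custom CUDA kernels instead of Triton kernels generated by torch.compile.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-32B",  # example model id, not prescribed by this PR
    compilation_config={"use_inductor": False},
)
outputs = llm.generate(["Hello"], SamplingParams(max_tokens=8))
print(outputs[0].outputs[0].text)
```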
Before your PR is "Ready for review"
Pre checks:
Additional Information